Hamming-like distances for ill-defined strings in linguistic classification
Abstract
Ill-defined strings often occur in the soft sciences, e.g. in linguistics or biology. In this paper we consider strings of length ℓ which have in each position one of three symbols: 0 (false), 1 (true), or ♭ (irrelevant). We examine some generalisations of the usual Hamming distance between crisp binary strings which were recently used in computational linguistics. We comment on their metric properties, since these should guide the selection of the clustering algorithm to be used for language classification. The concluding section is devoted to future work and compares the string approach, as currently pursued, with alternative approaches.
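As a rough illustration of what a Hamming-like distance over such three-symbol strings could look like, the Python sketch below compares two strings only at positions where both are crisp and normalises by the number of such positions. This particular definition, the function names, and the use of `*` as an ASCII stand-in for the "irrelevant" symbol are assumptions made here for illustration; the abstract does not specify which generalisations the paper studies.

```python
from typing import Optional

# Hypothetical ASCII stand-in for the "irrelevant" symbol (an assumption).
IRRELEVANT = "*"


def hamming(x: str, y: str) -> int:
    """Usual Hamming distance between two equal-length crisp strings."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(x, y))


def masked_hamming(x: str, y: str) -> Optional[float]:
    """Illustrative Hamming-like distance for ill-defined strings.

    Counts disagreements only at positions where neither string is
    irrelevant, divided by the number of such positions. Returns None
    when no position is comparable.
    """
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    comparable = [(a, b) for a, b in zip(x, y) if IRRELEVANT not in (a, b)]
    if not comparable:
        return None
    return sum(a != b for a, b in comparable) / len(comparable)
```

With this sketch, taking x = "00", y = "0*" and z = "01" gives a distance of 0 between x and y and between y and z, but 0.5 between x and z. So this illustrative distance does not satisfy the triangle inequality, which is the kind of metric property that matters when choosing a clustering algorithm.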
Similar resources
Capacity of Bidirectional Associative Memory
The capacity of bidirectional associative memory (BAM) has been studied extensively, but not exhaustively. In particular, it has not been investigated in the context of string coding. In this paper we apply different approaches to estimate the capacity of BAM for string coding. One of these approaches is recalling all coded strings. Another is applying Hamming and Levenshtein distances ...
Computational dialectology in Irish Gaelic
Dialect groupings can be discovered objectively and automatically by cluster analysis of phonetic transcriptions such as those found in a linguistic atlas. The first step in the analysis, the computation of linguistic distance between each pair of sites, can be carried out using the Levenshtein distance between phonetic strings. This correlates closely with the much more laborious technique of determinin...
Capacity Inverse Minimum Cost Flow Problem under the Weighted Hamming Distances
Given an instance of the minimum cost flow problem, a version of the corresponding inverse problem, called the capacity inverse problem, is to modify the upper and lower bounds on arc flows as little as possible so that a given feasible flow becomes optimal to the modified minimum cost flow problem. The modifications can be measured by different distances. In this article, we consider the capac...
Efficient Algorithms for Some Variants of the Farthest String Problem
The farthest string problem (FARTHEST STRING) is one of the core problems in the field of consensus word analysis and several biological problems such as discovering potential drugs, universal primers, or unbiased consensus sequences. Given k strings of the same length L and a nonnegative integer d, FARTHEST STRING is to find a string s such that none of the given strings has a Hamming distance...
Approximate Regular Expression Matching
We extend the definition of Hamming and Levenshtein distance between two strings used in approximate string matching so that these two distances can also be used in approximate regular expression matching. Next, methods for constructing nondeterministic finite automata for approximate regular expression matching under both of these distances are presented.
Publication date: 2007